Statistical-Computational Tradeoffs in Planted Models: The High-Dimensional Setting
Abstract
Planted models assume that a graph is generated from a set of clusters by randomly placing edges between nodes according to their cluster memberships; the task is to recover the clusters given the graph. Special cases include planted clique, planted partition and planted coloring. This paper studies the statistical-computational tradeoffs of these models. Our focus is the high-dimensional setting, where the number of clusters is allowed to grow with the number of nodes. We show that the complexities of cluster recovery exhibit phase transitions. In particular, the space of model parameters can be partitioned into four regions with decreasing statistical and computational complexities: (1) the impossible regime, where all algorithms fail; (2) the hard regime, where the exponential-time Maximum Likelihood Estimator (MLE) succeeds; (3) the easy regime, where a polynomial-time convexified MLE succeeds; (4) the simple regime, where a simple algorithm based on counting degrees and common neighbors succeeds. Moreover, each of these algorithms is likely to fail in the next harder regime.
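To make the generative model concrete, the following is a minimal sketch of a planted partition graph together with the two statistics named in regime (4), node degrees and common-neighbor counts. It is not the paper's algorithm: the function names, the equal-size contiguous clusters, and the parameters `p` (within-cluster edge probability) and `q` (cross-cluster edge probability) are assumptions made for illustration, and the decision thresholds the paper would use are not given in the abstract and are not reproduced here.

```python
import random


def sample_planted_partition(n, k, p, q, seed=0):
    """Sample an undirected graph with k planted clusters (illustrative).

    Nodes in the same cluster are joined with probability p; nodes in
    different clusters are joined with probability q.
    """
    rng = random.Random(seed)
    # Assumption: clusters are contiguous blocks of near-equal size.
    labels = [i * k // n for i in range(n)]
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            prob = p if labels[u] == labels[v] else q
            if rng.random() < prob:
                adj[u].add(v)
                adj[v].add(u)
    return labels, adj


def degree(adj, u):
    """Number of neighbors of node u."""
    return len(adj[u])


def common_neighbors(adj, u, v):
    """Number of nodes adjacent to both u and v."""
    return len(adj[u] & adj[v])


def mean_common_neighbors(labels, adj):
    """Average common-neighbor count for same-cluster and cross-cluster pairs."""
    same, diff = [], []
    n = len(labels)
    for u in range(n):
        for v in range(u + 1, n):
            bucket = same if labels[u] == labels[v] else diff
            bucket.append(common_neighbors(adj, u, v))
    return sum(same) / len(same), sum(diff) / len(diff)


if __name__ == "__main__":
    labels, adj = sample_planted_partition(n=60, k=3, p=0.8, q=0.1, seed=0)
    same_mean, diff_mean = mean_common_neighbors(labels, adj)
    print("mean common neighbors, same cluster:", same_mean)
    print("mean common neighbors, different clusters:", diff_mean)
```

When `p` is well above `q`, pairs in the same cluster tend to share more neighbors than pairs in different clusters, which is the signal a counting-based method can exploit; how large the gap must be relative to the cluster size is what the regime boundaries in the paper quantify.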
Similar resources
Statistical-Computational Phase Transitions in Planted Models: The High-Dimensional Setting
The planted models assume that a graph is generated from some unknown clusters by randomly placing edges between nodes according to their cluster memberships; the task is to recover the clusters given the graph. Special cases include planted clique, planted partition, planted densest subgraph and planted coloring. Of particular interest is the high-dimensional setting where the number of cluste...
Statistical-Computational Tradeoffs in Planted Problems and Submatrix Localization with a Growing Number of Clusters and Submatrices
We consider two closely related problems: planted clustering and submatrix localization. In the planted clustering problem, a random graph is generated based on an underlying cluster structure of the nodes; the task is to recover these clusters given the graph. The submatrix localization problem concerns locating hidden submatrices with elevated means inside a large real-valued random matrix. O...
Sharp Computational-Statistical Phase Transitions via Oracle Computational Model
We study the fundamental tradeoffs between computational tractability and statistical accuracy for a general family of hypothesis testing problems with combinatorial structures. Based upon an oracle model of computation, which captures the interactions between algorithms and data, we establish a general lower bound that explicitly connects the minimum testing risk under computational budget con...
Statistical and Computational Tradeoffs of Regularized Dantzig-type Estimator
Nesterov’s smoothing technique has been widely applied to solve non-smooth optimization problems involving high dimensional statistical models. However, existing theory focuses more on its computational properties rather than statistical properties. This paper bridges this gap by studying a family of regularized Dantzig-type estimators. For these estimators, we show that the smoothing technique...
Finding and Leveraging Structure in Learning Problems
The problem of learning from noisy and high dimensional data is an important challenge that has received much attention in the modern machine learning and statistics literature. These problems arise in numerous applications: large scale collaborative filtering, learning gene regulatory networks and genome wide association studies to name a few. This thesis focuses on understanding the statistic...
Publication date: 2013